MIT-APurify v1.3
----------------
MIT-syntax version (GCC).
(c) by Samuel DEVULDER
jan. 1996
Samuel.Devulder@info.unicaen.fr
DESCRIPTION (SHORT):
--------------------
This is APurify for compilers that produce MIT-syntax assembly files.
As far as I know, only GCC uses that syntax, so this is in effect the
version for the GCC compiler. If you are using another compiler, read
MOT-APurify instead. In the rest of this document, APurify stands for
MIT-APurify, and I assume you are using the GCC compiler.
APurify is a program that lets you detect bad memory accesses made by
your programs without any specific external device (MMU). It helps you
find bugs caused by accessing memory not owned by your program.
INSTALLATION:
------------
This archive contains the version of APurify for the GCC compiler as
well as versions for other compilers. Here is a description of the
GCC-related files in the archive, and of what to do with each of them
to install APurify.
- doc/MIT-APurify.doc   The file you are currently reading. Put it
                        with all your doc files. It is useful from
                        time to time.
- doc/History           The whole history (this file is not very
                        useful for most people). Do whatever you
                        want with it.
- bin/MIT-APurify       The parser tuned for the MIT syntax. Rename
                        it as APurify and put it somewhere in your
                        path.
- lib/APur-gcc.a        The link-time library. Rename it as APur.a
                        and put it somewhere in your library
                        search path.
- test/test.c           Source of a simple test file, included so
                        that you can rebuild the test program. Do
                        whatever you want with it.
- test/test.gcc         The test program, APurify'ed. Run it to see
                        how APurify is useful :-).
SYNOPSIS:
--------
Usage: APurify [-revinfo] <inputfile> [options]
Where options can be:
    ?         To display this usage
    -h        To display this usage
    -?        To display this usage
    -tb       To test memory referenced through base register
    -ts       To test memory referenced through stack register
    -tl       To test memory referenced through local stack frame
    -tp       To test pea instructions
    -o arg    Specifies output file (def=%s)
    -br arg   Sets the base register (def=A4)
    -mp arg   Sets the main entry-point (def=_main)
Options can be anywhere on the command line. NOTE: They can no longer
be merged together; they must be separated by spaces. You can predefine
them with the environment variable AP_MITP_OPT. For example, if you do:
CLI> SetEnv AP_MITP_OPT "-tb -br A5"
then "-tb -br A5" will automatically be added to the command line. The
space between an option and its argument can be omitted. Thus "-br A4"
is the same as "-brA4". Here is a description of the arguments and
flags:
-revinfo  This displays information about APurify (name, size and
          date of modules, and the number of compilations done for
          this version).
-br arg   This sets the base register used to reference memory
          in the SMALL_DATA model. Usually A4 is used for that
          purpose, and that is the default. If A5 is used instead,
          add -brA5 to your command line.
-tb       This enables APurify to check all memory referenced
          through the base register (see -br). If you are using the
          SMALL_DATA model, add this flag to your command line. By
          default, APurify won't check memory referenced through the
          base register.
          NOTE: for the safest check, you should always use this
          option, even if you're not in the small-data model (A4 may
          be used as a temporary register in that case). To make
          this automatic, you can use the environment variable.
-ts       This enables APurify to check memory referenced through
          the stack pointer (SP or A7). By default APurify won't
          check such accesses (to reduce the code size and increase
          the runtime speed). This option will detect when you have
          no more room on your stack (stack overflow).
-tl       This enables APurify to check memory referenced through
          the local stack pointer (the one that is link'ed and
          unlink'ed when entering and exiting a C function). By
          default, this is switched off. This option allows APurify
          to detect stack overflow.
-tp       This enables APurify to check indirect addresses pushed
          onto the stack with a pea. By default this is off. When
          used, this option will check things like "pea a2@(10)" and
          the like. This can help you with memory accessed through a
          pointer in code that has not been APurify'ed. For example,
          it is useful for things like fread(&ptr[10],10,1,fp),
          because in that case the "pea a2@(10)" used to push
          &ptr[10] onto the stack will be checked, and if ptr[10] is
          not owned by your program, you'll get an APurify error.
          Please note that this may not work all the time, since
          &ptr[0] can be translated as "movel a0,sp@-", which won't
          be checked.
-o arg    This specifies the name of the output file. If omitted, the
          output file will be the same as the input file (source
          file). The name of the output file can be given as a real
          name or as a pattern. A pattern is a string in which
          special sequences of characters (called specifiers) are
          replaced by special strings. Let's suppose that the input
          file is
              drive:path/file.ext
          Here is a description of the specifiers:
          %s  is replaced by the full source name:
                  drive:path/file.ext
          %S  expands to the full source name without the
              extension:
                  drive:path/file
          %b  stands for the full basename:
                  file.ext
          %B  is a shortcut for the basename without the
              extension:
                  file
          %p  is the path (the ending "/" or ":" is included):
                  drive:path/
          %e  is the extension (the "." is omitted):
                  ext
          Thus, if you put "-o ram:%B-apurify.%e" on the command
          line, the output file will be "ram:file-apurify.ext" in
          our example.
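The specifier rules above can be sketched in C as follows. This is only
an illustration of the substitutions described in this document, not
APurify's actual code: the function name expand_pattern() is
hypothetical, and its handling of a name without extension or of an
unknown "%x" sequence is an assumption.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper, NOT APurify's real code: expand an -o pattern
 * `pat` for the source name `src` into `out` (capacity `cap`).
 * Returns 0 on success, -1 if the result does not fit. */
int expand_pattern(const char *pat, const char *src, char *out, size_t cap)
{
    /* The path ends after the last '/' or ':' (separator included). */
    const char *base = src;
    for (const char *p = src; *p; p++)
        if (*p == '/' || *p == ':')
            base = p + 1;

    /* The extension starts after the last '.' of the basename. */
    const char *dot = strrchr(base, '.');
    size_t src_len  = strlen(src);
    size_t path_len = (size_t)(base - src);
    size_t stem_len = dot ? (size_t)(dot - base) : strlen(base);
    size_t n = 0;

    assert(out != NULL && cap > 0);

    for (; *pat; pat++) {
        const char *piece = pat;        /* default: copy one literal char */
        size_t len = 1;

        if (pat[0] == '%' && pat[1] != '\0') {
            switch (pat[1]) {
            case 's': piece = src;  len = src_len;             break;
            case 'S': piece = src;  len = path_len + stem_len; break;
            case 'b': piece = base; len = strlen(base);        break;
            case 'B': piece = base; len = stem_len;            break;
            case 'p': piece = src;  len = path_len;            break;
            case 'e': piece = dot ? dot + 1 : "";
                      len = strlen(piece);                     break;
            default:  len = 2;          /* assumption: keep "%x" as is */
                      break;
            }
            pat++;                      /* skip the specifier letter */
        }
        if (n + len >= cap)
            return -1;                  /* no room for text plus NUL */
        memcpy(out + n, piece, len);
        n += len;
    }
    out[n] = '\0';
    return 0;
}
```

Calling expand_pattern("ram:%B-apurify.%e", "drive:path/file.ext", buf,
sizeof buf) fills buf with "ram:file-apurify.ext", which matches the
example given for -o.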
-mp arg   This tells APurify which label should be considered as the
          entry point. By default it is set to "_main", and it should
          not be modified.
-?
-h
?         Obvious options.
DESCRIPTION (A BIT LONGER):
--------------------------
As a general rule, at the microprocessor level, there are two ways to
access memory: direct access and indirect access. For example, in C,
direct access can be viewed as accessing global variables, and indirect
access as accessing an array element. More precisely, direct access
corresponds to reading or writing a variable whose address is known at
compilation time (or once the program has been loaded into memory).
Indirect access is used for variables whose address is determined
dynamically by the program. For example, if p is a pointer to an array
allocated by malloc(), *p is an indirect access. Such an access also
occurs for an expression like T[i] where T is a global array, because
the address of T[i] is not known at compilation time: it depends on the
index value i. Using indirect access to memory is called indirection.
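A minimal C sketch of the two kinds of access described above. The
variable and function names are made up for the illustration, and the
instructions GCC really emits depend on the data model and on the
optimisation options.

```c
/* Illustration only: names are hypothetical. */
int counter;    /* global: its address is fixed at link/load time */
int table[8];   /* global array */

/* Direct access: the address of `counter` is a constant. */
int read_direct(void)
{
    return counter;
}

/* Indirect access: the address table + i is computed at run time,
 * so this is the kind of access APurify checks. */
int read_indexed(int i)
{
    return table[i];
}

/* Indirect access through a pointer, e.g. one returned by malloc(). */
int read_pointer(const int *p)
{
    return *p;
}
```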
A well-behaved program must not access memory it does not own. That
kind of access is called illegal.
Illegal direct access to memory is not possible because, by
definition, only global variables can be accessed that way, and those
variables obviously belong to the program (except for code written in
assembly language that references absolute addresses, for example
"btst #6,$bfe001"; but that kind of code is not good programming
:-)). So we can assume that direct access to memory is always right.
On the other hand, indirect access to memory can certainly be
illegal. Many bugs are caused by overstepping array boundaries. If the
overstep happens while reading a value, it does not cause much trouble
for other running tasks (it is an error inside your task); but if it
happens while writing, you may directly interfere with other tasks and
a big mess can follow (total breakdown of the system).
APurify deals with that kind of access by verifying the validity of
each indirect access to memory. It remembers the memory that was
allocated by the program and checks every access against it. One might
think that makes a lot of tests! Well, yes, but APurify is not designed
for the everyday use of programs, just for test phases. Moreover,
indirections do not actually occur very often. Only array-based
variables produce indirections. The variables on the stack, although
they are accessed by indirection, are not checked by default because
their access is always safe (at least if there is no stack overflow!).
Also, in the SMALL_DATA model, global variables are accessed through
indirection, but they are not checked by default (see -tb).
If an illegal access is found, APurify displays an error message, by
default on the error stream of the program. There are two kinds of
illegal access. Some are accesses to memory that doesn't belong to the
program at all (this is called an access between blocks); others are
accesses that cover partly memory owned by the program and partly
memory not owned by it (this is an overstepping of a block). You can
picture this as follows: if [ 1 ] and [ 2 ] represent two blocks
allocated by the program and ( 3 ) the memory accessed, then
---- [ 1 ] ---- ( 3 ) ---- [ 2 ] ---->
0 increasing address
corresponds to the first kind of illegal access and
---- [ 1 ( ] 3 ) ---- [ 2 ] ----->
or
---- [ 1 ] ---- ( 3 [ ) 2 ] ----->
corresponds to the second kind of access. The first kind is very common
but the second is quite rare (it is rather a misalignment problem).
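The distinction between the two kinds of illegal access can be sketched
in C like this. The Range type, the AccessKind values and classify()
are hypothetical names chosen for the illustration; they do not
describe how APurify stores its blocks internally.

```c
#include <stddef.h>

/* Hypothetical types for illustration only; not APurify's internals. */
typedef struct {
    unsigned long start;   /* first address of the range */
    unsigned long len;     /* length in bytes */
} Range;

typedef enum {
    ACCESS_OK,        /* completely inside one owned block */
    ACCESS_BETWEEN,   /* touches no owned block at all */
    ACCESS_ACROSS     /* partly inside a block, partly outside */
} AccessKind;

AccessKind classify(Range acc, const Range *blocks, size_t nblocks)
{
    AccessKind kind = ACCESS_BETWEEN;
    unsigned long acc_end = acc.start + acc.len;   /* one past the end */

    for (size_t i = 0; i < nblocks; i++) {
        unsigned long blk_end = blocks[i].start + blocks[i].len;

        if (acc.start >= blocks[i].start && acc_end <= blk_end)
            return ACCESS_OK;
        if (acc.start < blk_end && acc_end > blocks[i].start)
            kind = ACCESS_ACROSS;   /* overlaps only part of a block */
    }
    return kind;
}
```

With a 16-byte block at $00245f10, a 4-byte access at $00245f20 is
classified as "between" and a 4-byte access at $00245f1e as "across".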
APurify has two output modes. One is verbose and tries to give a lot
of information in words. The other is briefer and gives you the same
information, but you'll have to decode it.
When APurify starts and ends, it outputs the date and time. This is
useful if you are using logfiles: you can keep all your logs in a
single file and find any execution by its date.
In case of an error, APurify displays some text. The first line
looks like this one:
**** APURIFY ERROR ! [$<N1>(<N2>) <ATTR> (<TEXT1>)] <TEXT2>:
That line describes the accessed memory. <N1> is the hexadecimal
address accessed. <N2> is the length of the access (in decimal). <ATTR>
represents the type of access. <TEXT1> lets you find where in your
code the illegal access happened. <TEXT2> describes the kind of
illegal access.
If the length (<N2>) is 1, then it was a byte access; 2 stands for a
short access, 4 for an int/long access and more than 4 for a movem
instruction.
Attributes, <ATTR>, can be "R--" or "-W-". The first one represents a
read access and the second a write access.
The text <TEXT1> looks like this:
<NAME>, PC=$<PC#> HUNK=$<HUNK#> OFFSET=$<OFF#>
<NAME> is the name of the subroutine where the error occurred. It is
always displayed (even if the subroutine is a "static" one). The rest
of the line may be only partially displayed, showing as much
information as APurify can get. <PC#> is a hexadecimal address pointing
to the instruction that produced the error. <HUNK#> and <OFF#> are the
hunk number and the relative offset of <PC#>. Using <HUNK#>, <OFF#> and
a disassembler, you can very easily find where your code is bad (BTW, I
use dobj from netdcc, (c) by Matt Dillon). Please note that in this new
version, <PC#> no longer points to some instruction before the faulty
one. It is always the real faulty address.
The remaining lines show the context of the illegal access. They
give you information about the surrounding memory blocks owned by
your program. Each block is displayed according to the following
pattern:
[$<N1>(<N2>) <ATTR> (<TEXT>)]
where <N1> is the hexadecimal address of the beginning of the block and
<N2> its length (in decimal). Note that the length may seem to be
longer than the one you passed to malloc(), and the address may point
before the one you obtained from malloc(). This is not wrong! The
malloc() subroutine may add some information (like a doubly-linked
list or the length of the allocation) to the block you requested. That
extra information is put before the address you receive, which explains
this behavior. In this version of APur.lib, it takes 12 ($C) extra
bytes. So if you allocate 10 bytes, don't be surprised if APurify
thinks you've requested 22 bytes.
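As a worked example of that arithmetic, here is a small C sketch. The
names are hypothetical, and the 12-byte figure is only the one quoted
above for this version; it may differ in other releases.

```c
/* 12 ($C) bytes of bookkeeping, as quoted in this document. */
enum { APUR_HEADER_BYTES = 12 };

/* Length APurify is expected to show for a malloc(requested) block. */
unsigned long reported_size(unsigned long requested)
{
    return requested + APUR_HEADER_BYTES;
}

/* Address APurify is expected to show, given the pointer value that
 * malloc() returned to the program. */
unsigned long reported_start(unsigned long user_ptr)
{
    return user_ptr - APUR_HEADER_BYTES;
}
```

So malloc(10) shows up as a 22-byte block, and a pointer value of
$00245f1c belongs to a block displayed as starting at $00245f10.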
<ATTR> consists of 3 status characters, RWS,
where R means: read-enabled block
      W means: write-enabled block
      S means: system block (block not controlled by the program).
If a kind of access is forbidden, the letter '-' replaces the
corresponding character. <TEXT> is actually the name of the procedure
that allocated the block.
With each block you can find an offset. That offset is the distance
between that block and the faulty address. In verbose mode, you get
some text explaining the relative position of each block and the
accessed memory. In non-verbose mode you just get the offsets followed
by the blocks. The block with the smallest offset is displayed first,
since it is the one most likely to have been overstepped.
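The offsets printed for blocks that do not overlap the access can be
reproduced with the C sketch below. The sign convention and the formula
are inferred from the example report later in this document, not taken
from APurify's source, and block_offset() is a hypothetical name.

```c
/* Hypothetical helper: signed distance, in bytes, between an accessed
 * range and a block that does not overlap it. Positive means the
 * access lies after the block, negative means it lies before it.
 * Convention inferred from the example report only. */
long block_offset(unsigned long acc, unsigned long acc_len,
                  unsigned long blk, unsigned long blk_len)
{
    unsigned long acc_last = acc + acc_len - 1;   /* last byte accessed */
    unsigned long blk_last = blk + blk_len - 1;   /* last byte of block */

    if (acc > blk_last)
        return (long)(acc - blk_last);
    if (blk > acc_last)
        return -(long)(blk - acc_last);
    return 0;   /* overlapping ranges are not covered by this sketch */
}
```

For instance, a 4-byte access at $0026defc and a 27-byte block at
$0026df18 give -25, and a 4-byte access at $00245f20 and a 16-byte
block at $00245f10 give +1, as in the first two hits of the report.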
When an illegal write occurs (the only really dangerous thing you can
do by indirection), a requester opens to tell you about it. With that
requester, you can stop your program to prevent the deadly error from
really happening; in that case, exit() is called. You can also ignore
that error, or ignore all such errors (but then you'll surely meet the
guru!).
APurify checks for memory allocated but not freed by the program
(in fact, it detects non-deallocated blocks when the library is closed).
It knows about memory areas that exist independently of the
program's execution, that is to say: the first kilobyte of memory
(which contains the interrupt vectors of the 680x0 processor), the
program segments and the stack. Accessing those blocks will be illegal.
They get the S attribute (for SYSTEM blocks).
It takes into account memory blocks allocated by malloc() and
AllocMem(), and indirectly allocated blocks (by OpenScreen(), for
example). I did not test the last kind of allocation, but it should be
OK, because APurify patches the AllocMem() and FreeMem() entries. Thus
a program can access the bitplanes of one of its screens without error.
If the program makes a legal access, but the attributes of the block
are incompatible with the kind of access, a protection-error message is
displayed. Currently only the first kilobyte is read/write-protected,
but that may change in the future.
HOW TO USE APURIFY:
------------------
One can see APurify as a pre-assembler. It must be run on the assembly
language source file just before the assembler. It scans the file and
changes it a bit so that APur.a can be used.
The normal way to use it for a C program is to:
- compile the C source files, keeping the assembly language output (.s);
- run APurify on each .s file;
- assemble each .s file to get a .o file;
- link all the .o files together with APur.a.
For example, using gcc on prog.c, this gives:
CLI> gcc -g prog.c -o prog.s -S
CLI> APurify -tb prog.s
CLI> gcc -g prog.s -o prog -lAPur
As you can see, APurify needs no change to your C files to be used.
In this release you no longer need to call AP_Init() in the main()
function. The call is automatically inserted when the main entry label
(specified by -mp) is found. You should not use dos.library/Exit() to
abort your program; I think it will crash if APurify is running. If you
must use Exit(), call AP_Close() just before calling it. The
explanation is simple: since some system functions are patched, if a
program exits without closing the library, those patches will be
corrupted, pointing to code that is no longer in memory, and you'll
meet the guru (i.e. the computer will crash)... (You've been warned :-).
You can disable or enable the printing of messages by calling
AP_Report(flag). If flag is true (i.e. different from zero), printing
is enabled; if it is false (i.e. equal to zero), no output is produced.
This is useful for startup code. For example, if you are using the
argv[] array in C, APurify will print a lot of false errors. This is
because the values pointed to by this array are allocated before the
library is opened. You can avoid this by calling AP_Report(0) before
(and AP_Report(1) after) the code that uses argv[].
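Here is a sketch of that argv[] idiom in C. The prototype of
AP_Report() is an assumption (this document names the function but
gives no header or return type), WITH_APURIFY is a hypothetical macro
invented for this example, and the stand-in definition exists only so
that the sketch also builds without APur.a.

```c
#include <stddef.h>

#ifdef WITH_APURIFY             /* hypothetical: define when linking APur.a */
void AP_Report(int flag);       /* assumed prototype */
#else
static int ap_reporting = 1;    /* stand-in state, not part of APurify */
void AP_Report(int flag)
{
    ap_reporting = (flag != 0);
}
#endif

/* Count the characters of all arguments with reporting switched off,
 * because the argv[] strings were allocated before APurify started. */
size_t total_arg_length(int argc, char **argv)
{
    size_t total = 0;

    AP_Report(0);               /* silence the false errors */
    for (int i = 1; i < argc; i++)
        for (const char *p = argv[i]; *p; p++)
            total++;
    AP_Report(1);               /* back to normal reporting */
    return total;
}
```

Keeping the silenced region as small as possible matters, since any
real error inside it would go unreported as well.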
When debugging an APurify'ed program, you can put a breakpoint on
the function AP_Err(). That function is called each time APurify
detects an error, so you get the chance to look at your program just
before a faulty memory access occurs.
You can switch between the verbose output and the shorter one with
AP_Verbose(flag). If flag is true then the verbose mode is on. If it is
false then only short messages are printed. Some people prefer the
latter, so that is the default. If you prefer the verbose output, put
AP_Verbose(1) somewhere in your code and you'll get longer
explanations about illegal accesses.
You can specify a logfile where APurify writes its errors. To do
this, set the environment variable "APlog" (file ENV:APlog) to the name
of a logfile. If this variable is set, APurify appends all its output
to the file indicated. If this variable does not exist, the standard
error stream is used.
EXAMPLE:
-------
As an example, let's look at the test program compiled with
gcc-2.6.0. You'll see how you can use the report APurify produces to
find what's wrong in the program. I have included the commented report
in this document. My comments and explanations appear on lines
beginning with a "#".
**** APurify started on Thu Jan 4 23:03:58 1996
#
# Well, the report started...
#
**** APURIFY ERROR ! [$0026defc(4) R-- (_main, PC=$0027eef0 HUNK=$0
OFFSET=$410)] accessed between:
-25 [$0026df18(27) RW- (_main)]
+1405 [$0026d920(96) RWS (segment Module CLI)]
#
# Hum... First hit... It is an error while reading something in the
# main() procedure, between two blocks already allocated. The nearest
# block appears in first position, so we can guess that the error was
# made by accessing an array allocated in main() with a negative index.
# We can look at the code to find what is wrong with it. Using DOBJ, we
# find the following code at offset $410 in the first hunk:
#
# 00.00000410 24ab ffd8 MOVE.L -40(A3),(A2)
#
# This corresponds to the C code:
#
# a[0]=b[-10]
#
# Hence we've discovered a first error in the code. Note that -25 is
# the distance (in bytes) between the end of the accessed memory and
# the beginning of the block. This is not the difference between the
# starting addresses of the two blocks!
#
**** APURIFY ERROR ! [$00245f20(4) R-- (_main, PC=$0027ef1a HUNK=$0
OFFSET=$43a)] accessed between:
+1 [$00245f10(16) RW- (_main)]
-162301 [$0026d920(96) RWS (segment Module CLI)]
#
# Well... here it seems to be an access just after an allocated block.
# The offset +1 is the distance in bytes between the accessed memory
# and the allocated block. The situation is like this:
#
# ---------[ 1 ]( 2 )---------->
#
# Where "[ 1 ]" is the allocated block and "( 2 )" the accessed block.
# If we look in the code, we find:
#
# 00.0000043a 4aaa 0004 TST.L 4(A2)
#
# which corresponds to the test done by "if(a[1] == 0)". This is an
# error since the array 'a' is just 16-12=4 bytes long. So a[1] points
# outside the array!
#
**** APURIFY ERROR ! [$00245f1e(4) R-- (_read_shifted, PC=$0027ed9e
HUNK=$0 OFFSET=$2be)] accessed across the ending boundary of:
-2 [$00245f10(16) RW- (_main)]
#
# Hehe, another error... Damn! That test program is FULL of bugs!
# Yes, but this one is another kind of error. It is an access across a
# boundary. It occurs in the read_shifted() code. We need not look in
# the asm file to see the error. Here it is a misalignment error.
# Visually, that gives:
#
# ------------[ 1(]2 )----------->
#
# [ 1 ] = allocated ( 2 ) = accessed.
#
**** APURIFY ERROR ! [$00245f1c(4) R-- (_read_long, PC=$0027edce
HUNK=$0 OFFSET=$2ee)] accessed between:
-162305 [$0026d920(96) RWS (segment Module CLI)]
+2382621 [$00000000(1024) --S (Basic 680x0 vectors)]
#
# That error is strange! It is not an access to an array with a
# negative index, as one might immediately think: we never call
# read_long() in such a way, and the offsets are too big! Indeed, the
# accessed memory was valid some time ago, since it lies in the array
# 'a' (look at the second hit). Hence, it must be an access to free()'d
# memory. That error is then easily found in the code:
#
# free_arg(a); read_long(a).
# ^^^^^^^^^^^^
#
**** APURIFY ERROR ! [$00000004(4) R-- (_read_page_zero, PC=$0027ee32
HUNK=$0 OFFSET=$352)] accessed on a read-protected block:
+4 [$00000000(1024) --S (Basic 680x0 vectors)]
#
# Here the error is obvious: we are reading the zero page. If it were
# a write, that error would be very dangerous.
#
**** APURIFY WARNING ! Closing library without deallocation of the
following block(s):
- [$00271540(412) RW- (_main)]
- [$00287070(12012) RW- (_main)]
- [$0032e2c0(40012) RW- (_main)]
#
# The program has exit()ed. APurify tells us that we forgot to free
# those blocks. It is a case of memory leak. Those blocks were
# allocated in main(). They were allocated and lost by
#
# a=malloc(4),malloc(400),malloc(12000),malloc(40000)
#
# since the assignment only affects the first item of ",,,".
#
**** APurify ended on Thu Jan 4 23:04:00 1996
#
# Well... done :-).
#
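The leak reported at the end of this example comes from the way C
parses "a = x, y, z": the assignment binds tighter than the comma
operator, so only the first value is kept. The sketch below shows the
same effect with a hypothetical bump allocator over a static pool
(fake_alloc(), pool and used are made-up names), so that the example
itself does not leak real memory.

```c
#include <stddef.h>

/* Hypothetical stand-in for malloc(): hands out consecutive pieces of
 * a static pool, so nothing is really leaked by this example. */
static char pool[64];
static size_t used;

static void *fake_alloc(size_t n)
{
    void *p = pool + used;
    used += n;
    return p;
}

/* Same shape as the statement in the test program. It is parsed as
 * (a = fake_alloc(4)), fake_alloc(8), fake_alloc(16);
 * so the last two results are discarded as soon as they are made. */
void *keep_first_only(void)
{
    void *a;

    a = fake_alloc(4), fake_alloc(8), fake_alloc(16);
    return a;
}
```

After the call, three pieces have been handed out but only the first
one is still referenced, which is exactly the situation APurify
reports when the library is closed.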
LEGAL PART:
----------
This program is provided 'AS IS'. I am not responsible for any
damage it may cause (but I am responsible for the benefits it can give
you :-). Use this software at your own risk.
This program is FREEWARE. You can use and distribute it as long as
you keep the archive intact (no adulteration of files except for
compression). It can't be sold without my agreement (except for a
minimal amount to cover the media). You must ask me before any
commercial use of (any part of) this product. I keep all my rights on
this program and its future releases. I can modify this software
without notifying its users.
If you wish, you can send me a postcard or anything else you want
(money, documentation, amiga, hardware stuff, ...) in exchange for
using APurify. But there is no obligation :-). My postal address is:
M. DEVULDER Samuel
1, Rue du chateau
59380 STEENE
FRANCE
(yes, I'm French!). You can send suggestions or bug reports to my email
address:
devulder@info.unicaen.fr
NOTES:
-----
It has been compiled with cross-gcc 2.7.0 with libnix on a Sun
sparc.
I had the idea of that program after a chat with Cedric BEUST
(AMIGA NEWS) on IRC (Internet Relay Chat). Thanks Cedric !
I wish to thank Philippe Brand for his help with my port. I also wish
to thank J.C Hoehle for his useful advice.
All trademarks are the property of their respective owners.
There are some programs similar to APurify. One example is FORTIFY
(Simon P. Bullen), but it only detects illegal writes at the boundaries
of allocated blocks. Thus it can't detect big oversteps or oversteps
while reading, and the detection is not real-time. Enforcer can detect
illegal accesses to memory, but it needs a special device (MMU).
HINTS & TIPS:
------------
You can see some memory leaks with this version of APurify. It is
not really good at it, but it can help. A memory leak occurs when a
block of memory is no longer pointed to by your program. Those memory
blocks will necessarily be displayed when your program exit()s. So with
all the messages printed on that occasion, you can find such blocks. I
know this is not so great, but I think it can help you a little bit
(maybe in a future version I'll add some code to really check for
memory leaks).
BUGS:
----
APurify doesn't know about public memory that a program can read or
write without having allocated it. Thus, it will report an error when a
program reads or writes values in a message obtained through GetMsg()
calls. Use AP_Report() to avoid such reports.
It can display messages about closing the library without freeing
some memory blocks. This is due to printf(), which allocates memory
that is only freed on exit. This is not a real bug, and you can avoid
it by calling AP_Report(0) just before exiting. But keep in mind that
it is better to display false bugs than to miss real ones.
I've rewritten malloc()/realloc()/free(). I hope this will not
produce bugs (I've successfully tested the test program with libnix and
ixemul, so I hope it will be all right).
There are certainly more bugs, and I'm waiting for your bug reports.